-
Notifications
You must be signed in to change notification settings - Fork 105
Reorg and new custom decoder support #177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
oschwald
commented
Jun 22, 2025
- Move decoder code into new internal package
- Add change log
- Make functional option functions return a function
- Add support for options to Open and FromBytes
- Add missing docs
- Tighten up the golangci-lint config
- Start decoupling the decoding and relection
- Move size checks to DataDecoder
- Improve naming of data types and make them public
- Add separate Decoder type for manual decoding
- Add support for UnmarshalMaxMinDB
This will be used in the future for things like cache configuration. Adding now as it is a breaking change.
Although this doesn't provide immediate benefit, the intent is to make the data decoder available separately in a future commit for public use.
This further decouples the code. There are probably future improvements for reducing redundancy however.
This is largely based on #91.
Extend the UnmarshalMaxMindDB interface to work recursively with nested types, matching the behavior of encoding/json's UnmarshalJSON. Custom unmarshalers are now called for: - Struct fields that implement Unmarshaler - Pointer fields (creates value if nil, then checks for Unmarshaler) - Slice elements that implement Unmarshaler - Map values that implement Unmarshaler This enhancement allows for more flexible custom decoding strategies in complex data structures, improving performance optimization opportunities for nested types.
Add PeekType() method to Decoder that returns the type of the current value without consuming it, similar to jsontext.Decoder.PeekKind(). This enables look-ahead parsing for conditional decoding logic. The method follows pointers to return the actual data type being pointed to rather than just returning TypePointer.
652ee9d
to
9e78231
Compare
The Decoder and Unmarshaler types are now available in the mmdbdata package for applications that need direct access to the decoding API.
Renames Type* constants to Kind* and PeekType() to PeekKind() throughout the codebase to match Go's encoding/json/v2 naming conventions. This improves API consistency with the standard library.
Renames all Decoder methods from Decode* to Read* (e.g., DecodeString to ReadString) to match Go's encoding/json/v2 jsontext.Decoder naming conventions. This improves API consistency with the standard library.
Adds DecoderOption type and variadic options parameter to enable future configuration without breaking API changes. Follows existing library patterns like ReaderOption and NetworksOption.
Improved error messages to include byte offset information and, for the reflection-based API, path information for nested structures using JSON Pointer format. For example, errors may now show "at offset 1234, path /city/names/en" or "at offset 1234, path /list/0/name" instead of just the underlying error message. The implementation maintains zero allocation on the happy path through retroactive path building during error unwinding.
Replaces unbounded cache with fixed 512-entry array using offset-based indexing. Provides 15% performance improvement while preventing memory growth and ensuring thread safety for concurrent reader usage.
Move size validation logic to DataDecoder to eliminate duplication and create single source of truth for data validation.
And make them more accurate.
- Add comprehensive field documentation to Metadata struct with references to MaxMind DB specification - Add BuildTime() convenience method to convert BuildEpoch to time.Time - Enhance Verify() documentation explaining validation scope and use cases - Add ExampleReader_Verify showing verification and metadata access - Add TestMetadataBuildTime to verify BuildTime() method correctness
Add makeTestName helper function to create reasonable test names from long hex strings. TestDecodeByte and TestDecodeString now show concise names like '9e06b37878787878...7878' instead of extremely long hex strings that made test output unreadable.
Remove the experimental deserializer interface and all supporting code: - Delete deserializer.go interface definition - Delete deserializer_test.go test file - Remove deserializer support from reflection.go - Remove deserializer methods from data_decoder.go - Remove unused math/big import from data_decoder.go - Add breaking change notice to changelog recommending UnmarshalMaxMindDB
Error messages for type mismatches now display readable type names like 'Map' and 'Slice' instead of numeric codes, making debugging easier.
Make StringCache, buffer access, InternAt, and all DataDecoder methods package-private since they are only used within the decoder package. Keeps Kind methods public as they are exposed via mmdbdata type alias.
Replaces global mutex with per-entry mutexes to reduce allocation count from 33 to 10 per operation in downstream libraries while maintaining thread safety and good concurrent performance.
Add BenchmarkCityLookupConcurrent to demonstrate string cache performance improvements under concurrent load. Tests 1, 4, 16, and 64 goroutines performing realistic city lookups, providing clear metrics for concurrent scaling behavior.
Use direct type assertion instead of reflection-based interface checking in Decode method for better performance.
Replace nodeReader interface with specialized traverseTree functions for each record size. Eliminates interface dispatch overhead and implements branchless offset calculations for improved performance.
Replace reflect.Type.Implements() with type assertion using comma-ok idiom and remove redundant pointer interface check. The recursive decode call handles pointer receiver implementations via CanAddr(). Eliminates unmarshalerType variable and reduces code complexity while maintaining identical functionality and performance.
Remove inaccurate 15% performance improvement claim that was contradicted by benchmark testing. Add missing BREAKING CHANGE label for network options API changes.
This commit implements encoding/json/v2 style field precedence rules for struct field resolution, replacing the previous two-phase processing with a single-phase approach that properly handles field conflicts using depth-based and tag-based precedence. Key changes: - Replace fieldsType.anonymousFields with fieldInfo metadata structure - Implement breadth-first traversal for field collection with depth tracking - Add support for embedded pointer types (*EmbeddedStruct) - Apply json/v2 precedence rules: shallow beats deep, tagged beats untagged - Use single-phase processing with FieldByIndex for embedded field access - Initialize nil embedded pointers during field traversal Precedence rules applied: 1. Shallowest embedding depth wins 2. Among same depth, explicitly tagged field wins over untagged 3. Among same depth and tag status, first declared wins Fixes embedded pointer field access that was causing nil pointer dereferences in complex nested structures.
Adds basic validation for maxminddb struct tags inspired by encoding/json/v2's tag validation approach. Currently validates: - UTF-8 encoding of tag values - Provides foundation for future tag validation improvements The validation is designed to be non-intrusive - validation errors are currently ignored to maintain backward compatibility, but the infrastructure is in place for future enhancements. This follows the json/v2 pattern of catching obvious user mistakes while being permissive about edge cases that might be legitimate.
Implement field index reindexing and addressable value wrapper to reduce reflection overhead. Split field indices for faster access and eliminate redundant bounds checks during field traversal. Based on encoding/json/v2 optimizations for better performance in struct field access patterns.
Optimizes the reflection-based decoder with more efficient field access patterns and reduced memory allocations. Eliminates duplicate code paths and unused functions. Performance improvement measured in geoip2-golang benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR reorganizes the decoder implementation into internal packages, adds a new custom decoding API, and tightens error handling and configuration.
- Moved all decoding logic into
internal/decoder
, exposing a high-leveldecoder.ReflectionDecoder
and a publicmmdbdata.Decoder
factory. - Replaced legacy error functions with the
internal/mmdberrors
package for structured, contextual errors. - Refactored options to use functional option patterns and updated APIs (
Reader.Open/FromBytes
,Networks
,Verify
,Result.Decode
), plus added missing docs, tests, and benchmarks.
Reviewed Changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
verifier.go | Switched to use ReflectionDecoder.VerifyDataSection and mmdberrors for errors. |
traverse_test.go | Updated test to call network options as functions (IncludeAliasedNetworks() ). |
traverse.go | Refactored NetworksOption funcs to return closures and tree traversal adjustments. |
result.go | Simplified Decode /DecodePath to delegate to decoder.ReflectionDecoder . |
reader_test.go | Fixed literal types, updated closed-DB error messages, added concurrent benchmark. |
reader.go | Overhauled Reader to use ReflectionDecoder , introduced ReaderOption , enriched docs. |
node.go | Removed – replaced by readNodeBySize in traverse.go . |
mmdbdata/type.go | Added public type and factory aliases for custom decoding. |
mmdbdata/interface.go | Introduced Unmarshaler interface for custom unmarshaling. |
mmdbdata/doc.go | Added package documentation for mmdbdata . |
internal/mmdberrors/errors.go | Defined InvalidDatabaseError and UnmarshalTypeError types. |
internal/mmdberrors/context.go | Added ContextualError , WrapWithContext , and PathBuilder . |
internal/decoder/verifier.go | Moved verifyDataSection implementation into ReflectionDecoder . |
internal/decoder/* | Moved core decoding, reflection logic, caching, and extensive tests under internal/decoder . |
Comments suppressed due to low confidence (5)
reader.go:107
- The code uses errors.New but the "errors" package is not imported. Add
"errors"
to the import block.
import (
reader.go:331
- This call to
errors.New
will not compile because theerrors
package isn’t imported. Ensureerrors
is imported.
return Result{err: errors.New("cannot call Lookup on a closed database")}
reader.go:362
- This call to
errors.New
also requires importing theerrors
package. Add the import to fix the compile error.
return Result{err: errors.New("cannot call LookupOffset on a closed database")}
reader_test.go:1033
- You can’t range over an integer. Replace with a standard for-loop, e.g.,
for i := 0; i < numGoroutines; i++ {
.
for range numGoroutines {
reader_test.go:1042
- Likewise,
lookupsPerGoroutine
is an integer, not a slice. Usefor i := 0; i < lookupsPerGoroutine; i++ {
instead.
for range lookupsPerGoroutine {
type Result struct { | ||
ip netip.Addr | ||
err error | ||
decoder decoder | ||
decoder decoder.ReflectionDecoder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The field name decoder
shadows the imported package name decoder
. Consider renaming the field (e.g., refDecoder
) to improve clarity.
decoder decoder.ReflectionDecoder | |
refDecoder decoder.ReflectionDecoder |
Copilot uses AI. Check for mistakes.
Update ReadMap and ReadSlice to return collection size along with iterators, enabling efficient pre-allocation of maps and slices. Iterator remains the primary return value for natural usage patterns.